
Introduction

This article presents a network analysis of player transfers based on data from the website "www.transfermarkt.it". The goal of this report is to represent, through network analysis, the transfer market of the last 30 years in the top 6 leagues in Europe, followed by an in-depth look at the Italian leagues, asking questions that would help a sports agent meet the demands of his clients. This topic was chosen because, while network analysis is often used to analyse data from football matches, it has rarely been used to analyse market transfers. The few studies or theses found online only aim to describe the development of past transfers, without focusing on possible uses to help football agents.

The study was conducted by considering both the number of transfers and economic flows.

We chose the Transfermarkt site because it offers a large amount of information for free, which makes it well suited to network analysis.

Besides creating a script to extract the data, we made use of the information on the repository: https://github.com/emordonez/Transfermarkt.


Since football agents already have data on matches and players available through paid sites, we wanted to provide an additional tool, focused on the football transfer market, to improve both a player's career and the football agent's income.

We proceeded by asking a set of questions, summarized in the points below and then discussed further:

First Part

1) What are the top teams, leagues and connections between them over the last 30 years, and how have they varied each decade in terms of the number of transfers in the 6 leagues? And which ones play a central role?

2) Which top European teams move the most money between them? And which teams play a central role?

3) Which positions and ages are bought the most in these leagues, in terms of number of transfers and monetary exchanges?

Second Part

1) Which are the most active teams in Italy in terms of number of transfers since 2010?

2) What is the best minimal path from team A to team B, in terms of number of weighted transfers? What is the best minimal path from team A to team B that provides the highest monetary revenue?

3) Which player nationalities have certain clubs preferred since 2010, in terms of number of transfers?


Programming language

The project was carried out using the Python programming language, while the IDE used was PyCharm, which also allowed us to work on Jupyter.

The most used Python libraries for the analysis are:

Pandas --> managing and organizing data with databases or DataFrames.

NetworkX --> building the graph structure.

Matplotlib --> graphical part.

Plotly --> creating the dynamic graph.

Sklearn --> tools for predictive data analysis.

NumPy --> helps to analyze large matrices and multidimensional arrays.


Web Scraping - Data cleansing - Merging datasets

Web Scraping

Data scraping is used to extract data from websites. In our case we extracted data from www.transfermarkt.it. This is a German website that contains information such as rankings, results, transfers, players' careers and football club data.

Leghe.PNG

Player transfers from the 1990/1991 football season to the 2020/2021 football season were downloaded. For Italy, Serie B data from 2010 to 2021 were also downloaded.

Data cleansing

The data cleansing process can be found throughout the code. The most important changes concerned:

Merging datasets

The datasets provided to operate this notebook are the union of several datasets extracted with the procedures described above.
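As a rough illustration of the merging step, the sketch below stacks per-league tables into a single DataFrame with pandas. The column names (`season`, `from_club`, `to_club`, `fee_mln`), the helper `merge_transfers` and all rows are assumptions made for the example; they are not the actual schema or data of the notebook.

```python
import pandas as pd


def merge_transfers(frames_by_league):
    """Stack per-league transfer tables into one frame with a `league` column."""
    tagged = [df.assign(league=league) for league, df in frames_by_league.items()]
    merged = pd.concat(tagged, ignore_index=True)
    # Scraping the same page twice can yield repeated rows; keep one copy.
    return merged.drop_duplicates(ignore_index=True)


# Toy extracts with invented rows and hypothetical column names.
serie_a = pd.DataFrame({
    "season": ["2019/2020", "2020/2021"],
    "from_club": ["Club A", "Club B"],
    "to_club": ["Club B", "Club C"],
    "fee_mln": [10.0, 2.5],
})
premier_league = pd.DataFrame({
    "season": ["2020/2021", "2020/2021"],
    "from_club": ["Club D", "Club D"],
    "to_club": ["Club E", "Club E"],
    "fee_mln": [30.0, 30.0],  # the same row scraped twice
})

transfers = merge_transfers({"Serie A": serie_a, "Premier League": premier_league})
print(transfers)
```

Tagging each table with its league before concatenating keeps the origin of every row available for later league-level grouping.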


Analysis

The analysis has been divided into 2 parts.

It is important to remember that the football market is the set of contractual negotiations that define the transfer of a player from one club to another. Clubs can only carry out transfer operations in two windows: one during the summer and the other in January. However, this report makes no distinction between the two windows.

(https://it.wikipedia.org/wiki/Calciomercato)


FIRST PART

Question 1

Acting as an advisor to a football agent or sporting director of a football club, we decided to look back over the last three decades of the football market in the six leagues mentioned above, in order to provide a history and identify the most important teams (nodes) and transfers (arcs).

Three different datasets were examined, one per decade. For each, we plotted the graph of all transfer-market transactions.

In the '90-'99 graph, Benfica, Inter, Olympique Marseille and Torino Calcio are, in that order, the teams with the most transfers from 1990 to 1999. Teams from the German, English and Spanish leagues do not yet form any major nodes. This first graph also shows an important arc between Bayer 04 Leverkusen and Fortuna Düsseldorf. In this period the leagues are sharply separated, each almost forming its own cluster. An important arc between leagues is the one between Juventus and Borussia Dortmund.

In the 2000-2009 graph, Inter, Olympique Marseille, Liverpool and Benfica are, in that order, the teams with the most transfers. Teams still connect mostly with teams in the same league, but the arcs are more pronounced than in the previous decade.

In the last decade analysed, Serie A and La Liga become very active. However, teams such as Porto, Lisbon and Benfica, which belong to the Liga NOS, become very important in terms of degree centrality. The connection between teams that play in different leagues but belong to the same owner is also clearly visible, in this case the Pozzo family with Udinese, Granada and Watford.

Edges are white up to 9 links, yellow from 10 to 14, orange from 15 to 23 and red from 24 upwards. Node size is based on the number of transfers, node color on degree.
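The styling rules above can be sketched with NetworkX as follows. This is only an illustration: the helper `edge_color`, the club names and the link counts are invented, and the orange band is assumed to run from 15 to 23, since red starts at 24.

```python
import networkx as nx


def edge_color(n_links):
    """Map the number of transfers on an edge to a colour band."""
    if n_links <= 9:
        return "white"
    if n_links <= 14:
        return "yellow"
    if n_links <= 23:  # assumed upper bound of the orange band
        return "orange"
    return "red"


# (club, club, number of transfers): invented values, not from the dataset.
toy_links = [
    ("Club A", "Club B", 3),
    ("Club A", "Club C", 12),
    ("Club B", "Club C", 18),
    ("Club C", "Club D", 30),
]

G = nx.Graph()
G.add_weighted_edges_from(toy_links)

edge_colors = [edge_color(w) for _, _, w in G.edges(data="weight")]
# Weighted degree = total number of transfers involving the club.
transfers_per_club = dict(G.degree(weight="weight"))
node_sizes = [20 * transfers_per_club[n] for n in G.nodes]
# Unweighted degree = number of distinct partner clubs.
node_colors = [G.degree[n] for n in G.nodes]

# With Matplotlib installed, the graph could then be drawn with:
# nx.draw(G, edge_color=edge_colors, node_size=node_sizes, node_color=node_colors)
```

Keeping the thresholds in one small function makes it easy to reuse the same scheme with different bands for the other graphs.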

To better show the connections between the various leagues, we built a graph that connects only the leagues.

-Looking at the '90-'99 graph, we notice that the Liga NOS and the Bundesliga do not have an arc connecting them. This shows that there were no exchanges between these leagues.

-Looking at the 2000-2009 graph, we can see that the Liga NOS and the Bundesliga start to be connected, but only weakly. The connection between Ligue 1 and Liga intensifies, with the Premier League becoming a central node. This can also be seen in its positioning within the graph.

-In the last graph, period 2009-2021, the number of transfers has increased considerably. We can see this from the redder colour of the links and the thicker stroke.

Edges are yellow up to 100 links, orange from 101 to 200 and red from 201 upwards.
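A league-level graph of this kind can be obtained by collapsing individual transfers into league pairs. The sketch below is one possible way to do it, with invented transfers and a hypothetical helper `league_graph`; it assumes that transfers within the same league are ignored and that direction does not matter.

```python
from collections import Counter

import networkx as nx

# (selling league, buying league) for each transfer: invented rows.
toy_transfers = [
    ("Serie A", "Premier League"),
    ("Premier League", "Serie A"),
    ("Serie A", "Serie A"),  # same league, dropped below
    ("Liga NOS", "La Liga"),
    ("Bundesliga", "Ligue 1"),
]


def league_graph(transfers):
    """Build an undirected league graph weighted by number of transfers."""
    counts = Counter(
        tuple(sorted(pair)) for pair in transfers if pair[0] != pair[1]
    )
    G = nx.Graph()
    for (league_a, league_b), n in counts.items():
        G.add_edge(league_a, league_b, weight=n)
    return G


L = league_graph(toy_transfers)
print(L.edges(data="weight"))
# Two leagues without any exchange simply have no edge between them.
print(L.has_edge("Liga NOS", "Bundesliga"))
```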

BETWEENNESS CENTRALITY

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized. The betweenness centrality of a vertex is the number of these shortest paths that pass through it. For example, in a telecommunications network, a node with higher betweenness centrality would have more control over the network, because more information will pass through that node.

We used it to understand which teams were most present among the exchanges and therefore play a central role.
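To make the measure concrete, here is a minimal sketch on a toy graph using `networkx.betweenness_centrality`. The club names and links are invented for the example. Note that if a `weight` is passed, NetworkX treats it as a distance, so a transfer count would first have to be converted into a cost (for instance its inverse); the unweighted version is shown here.

```python
import networkx as nx

# Invented toy network: "Hub" trades with everyone, A and B also trade directly.
G = nx.Graph()
G.add_edges_from([
    ("Hub", "Club A"),
    ("Hub", "Club B"),
    ("Hub", "Club C"),
    ("Club A", "Club B"),
])

# Fraction of shortest paths between other pairs that pass through each node.
bc = nx.betweenness_centrality(G, normalized=True)
ranking = sorted(bc, key=bc.get, reverse=True)

for club in ranking:
    print(f"{club}: {bc[club]:.2f}")
```

In this toy case only "Hub" lies on shortest paths between other clubs (Club A to Club C and Club B to Club C), so it is the only node with a non-zero score.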

-In the period 1990-1999, two teams from the Liga NOS, Benfica and Lisbon, have the highest betweenness values, followed by Eintracht Frankfurt.

-In the decade 2000-2009, Benfica and Lisbon remain in first and second place, followed by Udinese.

-In the last decade analysed, 2010-2019, there is a clear change: Genoa and Parma emerge as teams with higher betweenness.

Question 2

Acting as an advisor to a football agent or a sports director of a football club, we decided to look at the last decade (2010-2021) of the football market in the 6 leagues mentioned above. This time, however, we focus on monetary transfers.

In this graph we can see that Juventus is a key node for the exchange of money between Italy and Europe. One of the central nodes is definitely Barcelona, which is among the teams that move the most money, together with PSG and Liverpool. Liverpool is strongly linked with FC Southampton, the club from which it drew the most players among those who won the UEFA Champions League in 2019. We also note that the nodes Chelsea, Barcelona, Juventus and Manchester United are the ones with the highest degree.

Edges are light blue up to 150 mln exchanged, blue from 150 mln to 200 mln and black from 200 mln upwards.

Betweenness Centrality

Betweenness centrality is a measure of centrality in a graph based on shortest paths. We used it to understand which teams were most present among the exchanges and therefore play a central role. Over the last 10 years Juventus, based on the monetary amount exchanged, had the highest betweenness, with 0.30; the graph makes it evident that Juventus is the team that moves the most money between Italy and Europe. It is followed by Barcelona, Manchester United and Chelsea.

Eigenvector Centrality

Question 3

Again acting as an advisor to a football agent or a sporting director of a football club, the idea was to observe the movements of the football market by combining the position and age of the player, to see how the flow between the leagues varied.

Network Analysis Age-Position leagues by number of transfers 2010-2021

Looking at the graph, the Premier League stands out as the league with the highest number of transfers, across all years, of very young goalkeepers, defenders and midfielders aged between 13 and 22. Serie A, on the other hand, is strongly connected with older players. The Liga NOS is the league with the highest number of transfers of players in the middle age groups. The Bundesliga, La Liga and Ligue 1 do have transfers across the various combinations, but they do not appear in the graph.

To simplify the display, only the most important link for each individual combination was kept.

Edge thickness is proportional to the number of transfers.
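Keeping only the most important link for each age-position combination can be sketched with a pandas group-by. The column names (`combo`, `league`, `n_transfers`), the helper `strongest_links` and the numbers are assumptions for illustration only.

```python
import pandas as pd


def strongest_links(flows, by, value):
    """Keep, for each group in `by`, the single row with the largest `value`."""
    idx = flows.groupby(by)[value].idxmax()
    return flows.loc[idx].reset_index(drop=True)


# Invented counts per (age-position combination, league).
flows = pd.DataFrame({
    "combo": ["GK 13-22", "GK 13-22", "DF 30+", "DF 30+"],
    "league": ["Premier League", "Serie A", "Premier League", "Serie A"],
    "n_transfers": [40, 25, 10, 30],
})

top = strongest_links(flows, by="combo", value="n_transfers")
print(top)
```

The same helper would work for the monetary graph by passing a fee column as `value` instead of a transfer count.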

Network Analysis Age-Position leagues for monetary exchanges 2010-2021

From the graph above we see that the Premier League is the league with the highest transfer spend across different combinations of age and position, in particular very young defenders and midfielders. Serie A, on the other hand, stands out for its purchases of older players. It is worth noting that Bundesliga spending is homogeneous, so much so that the league does not appear with any important link. The Liga NOS spends a lot on goalkeepers between 18 and 20 years old, and Ligue 1 on strikers between 36 and 42 years old.

To simplify the display, only the most important link for each individual combination was kept.

Edge thickness is proportional to the money exchanged.


SECOND PART

Question 1

Again acting as an advisor to a football agent or a sports director of a football club, we decided to look at the last decade (2010-2021) of the football market in the Italian professional leagues, i.e. Serie A, Serie B and Serie C.

In the graph representing the transfers of the last 11 years, the nodes with the most transfers are Parma, Atalanta and Chievo Verona. The connection between Lazio and Salernitana is also very marked, because both clubs belong to the same ownership, headed by President Lotito.
There is also a very strong connection between Cagliari and Olbia: over the last ten years cooperation between the two island clubs has intensified, helped by the friendship between the two club presidents. Other important links involve Parma, Crotone and Gubbio.

Edges are white up to 4 links, yellow from 5 to 18, orange from 18 to 25 and red from 25 upwards. Node size is based on the number of transfers, node color on degree.

Question 2

Still with the aim of helping a football agent, we constructed a metric that indicates the 3 best routes to bring a player to a top club. It was implemented for both transfer counts and monetary amounts.

We have a young player from Olbia who wants to join Juventus. What path could favour this dream? The metric ranks the minimal paths from team A to team B using the number of transfers along each path. The teams, in order, are: Cagliari, Genoa, Atalanta.

We have a young player from Olbia who wants to join Juventus. What path would an agent prefer according to a logic of monetary exchange value? Transfers with a higher economic value bring agents more income in percentage terms, which is why they would also be interested in using this metric.
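One possible way to obtain the 3 best routes is sketched below with `networkx.shortest_simple_paths`, which yields simple paths from shortest to longest. This is not necessarily the metric used in the notebook: the cost `1 / number of transfers`, the helper `best_paths` and every count are assumptions chosen for illustration, so that frequently used links count as "short".

```python
from itertools import islice

import networkx as nx

# (club, club, number of past transfers): invented counts, not from the dataset.
toy_counts = [
    ("Olbia", "Cagliari", 20),
    ("Cagliari", "Juventus", 5),
    ("Cagliari", "Genoa", 10),
    ("Genoa", "Juventus", 8),
    ("Cagliari", "Atalanta", 4),
    ("Atalanta", "Juventus", 8),
]

G = nx.Graph()
for club_a, club_b, n in toy_counts:
    # Assumed cost: the more transfers on a link, the cheaper it is to cross.
    G.add_edge(club_a, club_b, transfers=n, cost=1.0 / n)


def best_paths(graph, source, target, k=3, weight="cost"):
    """Return the k lowest-cost simple paths between two clubs."""
    paths = nx.shortest_simple_paths(graph, source, target, weight=weight)
    return list(islice(paths, k))


paths = best_paths(G, "Olbia", "Juventus")
for path in paths:
    print(" -> ".join(path))
```

For the monetary version, the same function could be reused with a cost derived from the money exchanged on each link instead of the transfer count.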


Question 3

The idea was to advise a player's agent by showing how the chances of a player of nationality X reaching a top European club vary (for the sake of simplicity, only certain clubs and nationalities are used in the analysis).

History teaches us that certain clubs are more likely to select players of certain nationalities. This factor, if properly used, could facilitate buying and selling.

In our example, we looked at 4 teams, Inter, Milan, Juventus and Cagliari Calcio, all in the Italian Serie A.

We also selected 6 of the most frequent countries of origin: France, Argentina, Brazil, Uruguay, England and Spain.

We can see that Brazil is a source of players for Italy, and the teams that have relied most on Brazilian players are Inter and Milan. Many Argentine players are also bought by the clubs located in Milan. England (the country where football was born) has no particular presence at the top Italian clubs or at Cagliari.

We also note that Inter has not bought any Spanish players in recent years. Overall, this analysis suggests that South American players are more sought after in Italy.
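A club-by-nationality comparison of this kind can be sketched with `pandas.crosstab`. The purchases below are invented and use placeholder club names, so they say nothing about the real clubs; the column names are also assumptions.

```python
import pandas as pd

# One row per purchased player: invented data with hypothetical columns.
purchases = pd.DataFrame({
    "club": ["Club A", "Club A", "Club A", "Club B", "Club B"],
    "nationality": ["Brazil", "Brazil", "Argentina", "France", "Brazil"],
})

# Number of purchases for each (club, nationality) pair.
counts = pd.crosstab(purchases["club"], purchases["nationality"])

# Share of each nationality within a club's purchases (rows sum to 1).
shares = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(shares.round(2))
```

Working with shares rather than raw counts makes clubs with very different transfer volumes comparable.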

As mentioned above, we took only 4 teams for the sake of simplicity. It would be interesting to extend the analysis to all teams and all other leagues, to observe how preferences vary according to the nationality of the player.


Conclusion Analysis 1

The analysis shows that over the last 10 years the Premier League has come to play a central role, both in the number of transfers and in the monetary amounts exchanged. It is also the league that focuses most on young players in every position.

Serie A remains a very active league in terms of player exchanges, with a greater prevalence of the over-30 age groups than the other leagues. As a result, the monetary value of its transfers is lower, because after this age players tend to lose value.

The Liga NOS has grown considerably over the last 20 years in terms of number of transfers, particularly in the 21-28 age group. An agent would find it easier to bring a player to Portugal if the player is in mid-career or needs a second chance.

La Liga is less active in terms of number of transfers, but it spends more on players than the other leagues. It spends on established players in every position, which is also explained by the presence of the two clubs to which every player aspires, Real Madrid and Barcelona.

The Bundesliga and Ligue 1 are leagues for which no particularly significant pattern was found, as they are much less active than the other leagues.

Conclusion Analysis 2

As far as the Italian leagues are concerned, it was easy to observe that the teams with the most transfers or the highest betweenness were not those considered top clubs.

The metrics we have created can also help an agent understand which path is best for the player to follow and which path offers the best commissions to the agent himself, provided they are properly combined with, or supported by, knowledge of the sporting management of the various clubs.

Conclusion Analysis 3

There are many studies on network analysis for individual match data, but not for market transfers. This study provides a foundation for future work that can be explored further on other aspects.

Through the integration of paid data from platforms such as WyScout or Infront Sports & Media, the work done so far could be expanded and combined with on-field player data.

This project examined only 6 leagues of one continent. For a future study it would be interesting to include the leagues not taken into account, or to focus on the various national teams.

The football market is a broad topic that could be explored further. We close this analysis with the regret that it was not possible to analyse the data of the women's leagues: in fact, there is currently only one site, in German, that contains partial data on the transfer flows of the German women's league.


Acknowledgements:

For the help in deepening the topics.

⚽🌐

May 2021 - Cagliari